Hao Tang

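Assistant Professor / Researcher at Peking University, Ph.D. Supervisor, Boya Young Scholar / Weiming Young Scholar / Zhiyuan Young Scholar
Selected for the National High-Level Overseas Talent Program, National Outstanding Overseas Student Award (Returnee Category), IEEE (RAS) Member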
Office: 5 Yiheyuan Road, Haidian District, Beijing, 100871, China🇨🇳

Hey, thanks for stopping by! 👋

I am a tenure-track Assistant Professor (Ph.D. supervisor) at the School of Computer Science, Peking University, China🇨🇳, where I lead the Embodied and Generative Intelligence Lab and conduct research in computer vision, machine learning, robotics, and artificial intelligence. Previously, I held postdoctoral positions at both CMU (Robotics Institute), USA🇺🇸, and ETH Zürich (Computer Vision Lab), Switzerland🇨🇭. My academic journey includes earning a master's degree from Peking University, China🇨🇳, and completing my Ph.D. (cum laude) at the University of Trento, Italy🇮🇹. Additionally, I had the privilege of working at and visiting several institutions, including University of Oxford (UK🇬🇧), MIT (USA🇺🇸), Harvard University (USA🇺🇸), IIAI (UAE🇦🇪), Northeastern University (USA🇺🇸), NUS (Singapore🇸🇬), NTU (Singapore🇸🇬), University of Michigan (USA🇺🇸), UCLA (USA🇺🇸), UPenn (USA🇺🇸), IMT Nord Europe (France🇫🇷), INSAIT (Bulgaria🇧🇬), HKU (Hong Kong🇭🇰), and so on.

Beyond academia, I have also had the honor of serving as a senior technical consultant for numerous AI startups, including those in USA🇺🇸, UK🇬🇧, Romania🇷🇴, and China🇨🇳, with technologies ranging from efficient AI to 3D vision to AIGC to AI4Blockchain, etc.

News & Events

Hiring!	We're hiring Postdoc/Ph.D./Master/Intern researchers on Generative AI, World Model, Spatial Intelligence, and Embodied AI for our PKU lab. Feel free to reach out to me directly.
2026-06	We have 6 papers (Self-Evolving VLA + Mobile Robot VLA + Video Spatial Reasoning + Physics-Aware Video Generation + 4D Gaussian Splatting + Motion Generation) accepted to ECCV 2026.
2026-05	We have 1 paper (Invariant Structure Learning) accepted to KDD 2026, 2 papers (4D World Generation + RL Position Paper) accepted to ICML 2026, 2 papers (Cross Domain Test Time Scaling + Post-training Quantization) accepted to IJCAI 2026, and 1 paper (Medical Robotic Perception) accepted to ICIP 2026.
2026-04	We have 1 paper (Human Reaction Generation) accepted to FG 2026, and 1 paper (Arbitrary Style Transfer) accepted to TMM 2026.
2026-03	I was invited to serve as an Area Chair (AC) at NeurIPS 2026, and we have 1 paper (Facial Reactions Generation) accepted to TAFFC 2026.
2026-02	We have 4 papers including 1 highlight (Versatile Dental MLLM + HOI Video Generation + Panoramic X-ray Analysis + 3D Morphing) accepted to CVPR 2026, 1 paper (Text to 3D Avatars) accepted to TVCG 2026, and 1 paper (LLM Agents) accepted to AAMAS 2026.
2026-01	We have 1 paper (All-in-One Image Restoration) accepted to TPAMI 2026, 4 papers including 1 oral (3D VLM on Ancient Greek Pottery + LVLM Hallucination + Brain-Inspired Stereo Depth Estimation + Human Video Generation) accepted to ICLR 2026, and 1 paper (Stereo Depth Estimation in Underwater Scenes) accepted to ICRA 2026.
2025-12	We have 1 paper (Task-Aware Mixture-of-Experts) accepted to AAMAS 2026.
2025-11	We have 2 papers (Multi-Task Adaptation + Diffusion Quantization) accepted to AAAI 2026, 2 papers (3D Captioning + Robot Modeling) accepted to 3DV 2026, 1 paper (Cellular Phenotype Transition) accepted to AAAI 2026 Bridge AIMedHealth and 1 paper (Molecular Representation) accepted to AAAI 2026 Workshop AIDD.
2025-10	Honored to be invited by NVIDIA to share our latest research and vision on Shaping the Future with Generative and Embodied Intelligence, and we have 1 paper (Survey about Multimodal Alignment and Fusion) accepted to IJCV 2025.
2025-09	I was elected as one of the World's Top 2% Scientists in 2025 by Stanford University, and we have 1 paper (Articulation and Diffusion for Robot Modeling) accepted to CoRL 2025 LSRW Workshop, 4 papers including 1 spotlight (Parameter Efficient Merging for MLLM + Diffusion-based Adversarial Attacks + Spatial Adversarial Alignment + Dental AI) accepted to NeurIPS 2025, and 1 paper (VLM for Open-Vocabulary Segmentation) accepted to CVIU 2025.
2025-08	I was invited to serve as an Area Chair (AC) at ICLR 2026, and we have 1 paper (VLA for Multi-Task Manipulation) accepted to CoRL 2025.
2025-07	I was invited to serve as a Senior Program Committee (SPC) member at AAAI 2026, and we have 1 paper (Music-Guided Dance Video Synthesis) accepted to TPAMI 2025 and 1 paper (Video Anomaly Detection) accepted to ACM MM 2025.
2025-06	We have 1 paper (Cellular Phenotypic Transdifferentiation) accepted to ICML 2025 Workshop, 1 paper (Monocular Depth Estimation) accepted to TCSVT 2025, 2 oral papers (Surgical Robot + Reconstruction and Editing in V2X Scenarios) accepted to IROS 2025, and 1 paper (Medical Image Segmentation) accepted to ICCV 2025.
2025-05	I was invited to serve on the Editorial Board of Discover Artificial Intelligence (a journal by Springer Nature) and as an Area Chair (AC) at EMNLP 2025, and we have 1 paper (Real-Time ViT on Mobile) accepted to IJCV 2025.
2025-04	We have 3 papers (Single-Step Image SR + In-Context Meta LoRA + Sparse MoE) accepted to IJCAI 2025, 1 paper (Synergistic Immunotherapy in Glioma) accepted to Advanced Science 2025, 1 paper (Continual Gesture Learning) accepted to IJCNN 2025, and 1 paper (Fake News Video Detection) accepted to TMM 2025.
2025-03	I was invited to serve as an Area Chair (AC) at ACM MM 2025, and we have 1 paper (Accident Warning Agent) accepted to IV 2025.
2025-02	Honored to be invited by NVIDIA to share our latest research and vision on Bridging the Gap to Fault-Tolerant Quantum Computing. I was also invited to serve as an Area Chair (AC) at ACL 2025 and as a Senior Program Committee (SPC) member at IJCAI 2025, and we have 3 papers including 1 oral (Mamba for Image Compression + 4D Reconstruction + Diffusion Fourier Neural Operator) accepted to CVPR 2025.
2025-01	We have 1 paper (SAR Automatic Target Recognition) accepted to TAES 2025, 1 paper (Person Image Generation) accepted to TPAMI 2025, 1 paper (Explainability in MLLMs) accepted to NAACL 2025 Main Conference, and 1 paper (Urological Surgical Robots) accepted to ICRA 2025.
2024-12	We have 3 papers (Structured Pruning for LLM + FG-SBIR + Hair Transfer via Diffusion Model) accepted to AAAI 2025 and 1 paper (Efficient Fine-Tuning of LLM) accepted to ICASSP 2025.
2024-11	We have 1 paper (Virtual Try-On) accepted to TMM 2024.
2024-10	We have 1 paper (Quantization on Bird's-Eye View Representation) accepted to WACV 2025 and 1 paper (Semantic Segmentation on Autonomous Vehicles Platform) accepted to TCAD 2024.
2024-09	I was elected as one of the World's Top 2% Scientists in 2024 by Stanford University, and we have 1 paper (Camera-Agnostic Attack) accepted to NeurIPS 2024 and 1 paper (Medical Image Segmentation) accepted to ACCV 2024.
2024-08	I was invited as a speaker at the 2nd Workshop & Challenge on Micro-gesture Analysis for Hidden Emotion Understanding (MiGA) at IJCAI 2024, and we have 2 papers (Guided Image Translation + 3D Human Pose Estimation) accepted to PR 2024.
2024-07	We have 6 papers (Motion Mamba + Dataset Growth + Story Visualization and Completion + Diffusion Model for Semantic Image Synthesis + Generalizable Image Editing + 3D Semantic Segmentation) accepted to ECCV 2024, 1 paper (Survey about Physical Adversarial Attack) accepted to TPAMI 2024, and 2 papers (Talking Head Avatar + Story Visualization and Continuation) accepted to ACM MM 2024.
2024-06	I joined Peking University as an Assistant Professor.
2024-04	I received offers from MIT and Harvard University.
2024-02	We have 7 papers (Explanation for ViT + Faithfulness of ViT + Diffusion Policy for Versatile Navigation + Subject-Driven Generation [Final rating: 455] + Diffusion Model for 3D Hand Pose Estimation + Adversarial Learning for 3D Pose Transfer + Efficient Diffusion Distillation [224->235]) accepted to CVPR 2024.
2024-01	We have 1 paper (Architectural Layout Generation) accepted to TPAMI 2024.
2023-12	We have 1 paper (Sign Pose Sequence Generation) accepted to AAAI 2024.
2023-10	I was elected as one of the World's Top 2% Scientists in 2023 by Stanford University and we have 4 papers (BEV Perception + Efficient ViT + 3D Motion Transfer + Graph Distillation) accepted to NeurIPS 2023.
2023-09	We have 1 paper (Practical Blind Image Denoising) accepted to MIR 2023 and 1 paper (Diffusion Model for HDR Deghosting) accepted to TCSVT 2023.
2023-08	I received an offer from CMU.
2023-07	We have 1 paper (Semantic Image Synthesis) accepted to TPAMI 2023.
2023-06	We have 1 paper (Visible-Infrared Person Re-ID) accepted to ICCV 2023.
2023-05	We have 2 papers (Image Restoration Dataset + 3D-Aware Video Generation) accepted to CVPRW 2023 and 1 paper (3D Face Generation) accepted to JSTSP 2023.
2023-04	We have 1 paper (Speed-Aware Object Detection) accepted to ICML 2023, 2 papers (Lottery Ticket Hypothesis for ViT + Zero-shot Character Recognition) accepted to IJCAI 2023, 1 paper (3D Human Pose Estimation) accepted to PR 2023, and 1 paper (SAR Target Recognition) accepted to TGRS 2023.
2023-03	We have 6 papers (HDR Deghosting + Point Cloud Registration + Graph-Constrained House Generation + Mathematical Architecture Design + Text-to-Image Synthesis + Efficient Semantic Segmentation) accepted to CVPR 2023.
2023-02	We have 3 papers (Camouflaged Object Detection + Brain Vessel Image Segmentation + Cross-View Image Translation) accepted to ICASSP 2023 and 1 paper (Camouflaged Object Detection) accepted to TCSVT 2023.
2023-01	We have 1 paper (Semantic Image Synthesis) accepted to ICLR 2023 and 1 paper (Human Reaction Generation) accepted to TMM 2023.
2022-11	We have 4 papers (Real-Time Segmentation + Wearable Design + Efficient ViT Training + Text-Guided Image Editing) accepted to AAAI 2023, 1 paper (Person Pose and Facial Image Synthesis) accepted to IJCV 2022, 1 paper (Salient Object Detection) accepted to TIP 2022, and 1 paper (Object Detection Transformer) accepted to TCSVT 2022.
2022-10	We have 1 paper (Sinusoidal Neural Radiance Fields) accepted to BMVC 2022 and 1 paper (Guided Image-to-Image Translation) accepted to TPAMI 2022.
2022-09	We have 1 paper (Facial Expression Translation) accepted to TAFFC 2022 and 1 paper (Ship Detection) accepted to TGRS 2022.
2022-07	We have 5 papers (Real-Time SR + Video SR + Soft Token Pruning for ViT + 3D-Aware Human Synthesis + Video Semantic Segmentation) accepted to ECCV 2022, 1 paper (Gaze Correction and Animation) accepted to TIP 2022, and 1 paper (Cross-view Panorama Image Synthesis) accepted to PR 2022.
2022-06	We have 2 papers (Character Image Restoration + Character Image Denoising) accepted to ACM MM 2022.
2022-04	We have 1 paper (Real-Time Portrait Stylization) accepted to IJCAI 2022, 1 paper (Wide-Context Transformer for Semantic Segmentation) accepted to TGRS 2022, and 1 paper (Incremental Learning for Semantic Segmentation) accepted to TMM 2022.
2022-03	We have 5 papers including 1 oral (Text-to-Image Synthesis + 3D Human Pose Estimation + Text-Driven Image Manipulation + 3D Face Modeling + 3D Face Restoration) accepted to CVPR 2022, 1 paper (Image Generation) accepted to TPAMI 2022, and 1 paper (Cross-View Panorama Image Synthesis) accepted to TMM 2022.
2021-12	We have 2 papers (Generalized 3D Pose Transfer + Audio-Visual Speaker Tracking) accepted to AAAI 2022.
2021-11	We have 1 paper (Building Extraction in VHR Remote Sensing Images) accepted to TIP 2021.
2021-10	We have 3 papers (Cross-View Image Translation + Data-driven 3D Animation + Natural Image Matting) accepted to BMVC 2021.
2021-08	We have 1 paper (Layout-to-Image Translation) accepted to TIP 2021 and 1 paper (Unpaired Image-to-Image Translation) accepted to TNNLS 2021.
2021-07	We have 2 papers (Continuous Pixel-Wise Prediction + Unsupervised 3D Pose Transfer) accepted to ICCV 2021.
2021-06	We have 1 paper (Cross-View Exocentric to Egocentric Video Synthesis) accepted to ACM MM 2021 and 1 paper (Total Generate) accepted to TMM 2021.
2021-05	I received an offer from ETH Zurich.
2020-09	I received an offer from IIAI.
2020-08	We have 1 paper (Person Image Generation) accepted to BMVC 2020, 2 papers (Semantic Image Synthesis + Unsupervised Gaze Correction and Animation) accepted to ACM MM 2020, and 1 paper (Controllable Image-to-Image Translation) accepted to TIP 2020.
2020-07	We have 1 paper (Person Image Generation) accepted to ECCV 2020.
2020-05	We have 1 paper (Deep Dictionary Learning and Coding) accepted to TNNLS 2020 and 1 paper (Semantic Segmentation of Remote Sensing Images) accepted to TGRS 2020.
2020-02	We have 1 paper (Semantic-Guided Scene Generation) accepted to CVPR 2020.
2019-07	We have 1 paper (Keypoint-Guided Image Generation) accepted to ACM MM 2019.
2019-05	I received an offer from University of Oxford.
2019-02	We have 1 paper (Cross-View Image Translation) accepted to CVPR 2019.
2018-06	We have 1 paper (Hand Gesture-to-Gesture Translation) accepted to ACM MM 2018.
2018-02	We have 1 paper (Monocular Depth Estimation) accepted to CVPR 2018.
2016-07	We have 1 paper (Large Scale Image Retrieval) accepted to IJCAI 2016.
2015-08	We have 1 paper (Gender Classification) accepted to ACM MM 2015.

Position Openings

For prospective collaborators interested in Generative AI, World Model, Spatial Intelligence, Embodied AI, and AI4Quantum, we are offering multiple positions for highly motivated Postdoc / Ph.D. / Master / RA / externship / internship / visiting students. If you are interested in joining our group, please email me with your self-introduction, the project of interest (including the problem you are trying to solve and how you plan to solve it, being as specific as possible), your transcript, and CV to haotang@pku.edu.cn / bjdxtanghao@gmail.com. I'm sorry that I may not be able to respond to every email, but I assure you that your message will stand out if you have a strong research background.

For Ph.D./Postdoc/Master applicants, we have several openings for domestic students each year. Please reach out at least 6 months prior to the application deadline. For international students, PKU CS offers a variety of programs in English, including Master's and Ph.D. programs, Summer/Winter Schools, and various other options. Feel free to reach out if you are interested or have any questions. For RA/externship/internship/visiting students, we welcome undergraduate and graduate students from all over the world to apply for research internships of more than 6 months. Our RAs/interns/visitors have published many top-tier conference/journal papers (e.g., TPAMI, CVPR, NeurIPS) and have been admitted to Postdoc/Ph.D./Master programs in prestigious institutions such as MIT, Harvard, UCB, Google, Brown University, UMich, University of Toronto, Caltech, UCSC, ETH Zürich, NTU, NUS, TUM, PolyU, etc.

Research Lab

The mission of our research lab is to harness AI to address real-world challenges by bridging the gap between digital generation and physical interaction. Our research priorities include Generative AI, World Model, Spatial Intelligence, Embodied AI, and AI4Quantum.

Mingju Gao (PhD, previously from ICT, CAS, China🇨🇳)
Yuchen Guan (PhD, previously from Tsinghua University, China🇨🇳)
Qilin Wang (PhD, previously from Fudan University, China🇨🇳)
Siyuan Qian (PhD, w/ Shanghang Zhang, previously from BUAA, China🇨🇳)
Haoyu Wang (PhD, w/ Shiliang Zhang, previously from HIT, China🇨🇳)
Zhen Chen (PhD, w/ Shiliang Zhang, previously from Tongji University, China🇨🇳)
Derek Zeng (Master, previously from University of Waterloo, Canada🇨🇦)
Jiarui Ye (Master, previously from NUAA, China🇨🇳)
Xiaoyuan Wang (Visiting from CMU, USA🇺🇸)
Haozhan Tang (Visiting from CMU, USA🇺🇸)
Wenbo Gou (Visiting from CMU, USA🇺🇸)
Zhenyu Lu (Visiting from CMU, USA🇺🇸)
Jia Huang (Visiting from Columbia University, USA🇺🇸)
Guancheng Lu (Visiting from Northwestern University, USA🇺🇸)
Rohan Siva (Visiting from UT Austin, USA🇺🇸)
Jun Liu (Visiting from NEU, USA🇺🇸)
Changdi Yang (Visiting from NEU, USA🇺🇸)
Na Li (Visiting from Goldman Sachs & UPenn, USA🇺🇸)
Yuanzhe Liu (Visiting from UPenn, USA🇺🇸)
Zihao Wang (Visiting from UPenn, USA🇺🇸)
Yao Gong (Visiting from UPenn, USA🇺🇸)
Junjie Zeng (Visiting from UMich, USA🇺🇸)
Yuling Feng (Visiting from UMich, USA🇺🇸)
Haoyu Cheng (Visiting from UCSD, USA🇺🇸)
Peng Huang (Visiting from Boston University, USA🇺🇸)
Xiaoyi Liu (Visiting from Washington University in St. Louis, USA🇺🇸 -> now Ph.D. at Brown University, USA🇺🇸)
Kang Chen (Visiting from Rensselaer Polytechnic Institute, USA🇺🇸)
Linxi Wu (Visiting from University of North Carolina at Chapel Hill, USA🇺🇸)
Bin Xie (Visiting from IIT, USA🇺🇸)
Huixiu Jiang (Visiting from IIT, USA🇺🇸)
Zitong Zhang (Visiting from University of Louisville, USA🇺🇸)
Haodong Lu (Visiting from University of Toronto, Canada🇨🇦)
Wanru Cheng (Visiting from University of Toronto, Canada🇨🇦)
Federico Lin (Visiting from EPFL, Switzerland🇨🇭)
Peize Li (Visiting from KCL, UK🇬🇧)
Jingyi Wan (Visiting from University of Cambridge, UK🇬🇧)
Xuanyu Lai (Visiting from ICL, UK🇬🇧)
Yitong Luo (Visiting from ICL, UK🇬🇧 -> now Ph.D. at ICL, UK🇬🇧)
Haitao He (Visiting from QMUL, UK🇬🇧)
Baohua Yin (Visiting from University of Sussex, UK🇬🇧)
Zhuoran Wang (Visiting from Delft University of Technology, Netherlands🇳🇱)
Enze Wang (Visiting from Technical University of Munich, Germany🇩🇪)
Yitao Song (Visiting from Moscow State University, Russia🇷🇺)
Yingzhe Shao (Visiting from NTU, Singapore🇸🇬)
Zhiguang Han (Visiting from NTU, Singapore🇸🇬)
Xinhua Ma (Visiting from NTU, Singapore🇸🇬)
Zhen Long (Visiting from NUS, Singapore🇸🇬)
Ali Haider (Visiting from Kyung Hee University, South Korea🇰🇷)
Amitoj Singh Miglani (Visiting from IIT Roorkee, India🇮🇳)
Pirzada Suhail (Visiting from IIT Bombay, India🇮🇳)
Siddhant Pathak (Visiting from IIT BHU, India🇮🇳)
Vrushank Ajay Ahire (Visiting from IIT Ropar, India🇮🇳)
Mohammed Shehin (Visiting from NIT Calicut, India🇮🇳)
Ziwei Li (Visiting from KAUST, Saudi Arabia🇸🇦)
Ahmad Imran (Visiting from NUST, Pakistan🇵🇰)
Zeyu Zhang (Visiting from Australian National University, Australia🇦🇺)
Hongpeng Wang (Visiting from University of Sydney, Australia🇦🇺)
Jinqi Liao (Visiting from University of Sydney, Australia🇦🇺)
Zeyu Ren (Visiting from University of Melbourne, Australia🇦🇺)
Haihang Wu (Visiting from University of Melbourne, Australia🇦🇺)
Ziang Li (Visiting from UTS, Australia🇦🇺)
Zhixing Wang (Visiting from University of Malaya, Malaysia🇲🇾)
Pakawat Phasook (Visiting from King Mongkut’s University of Technology Thonburi, Thailand🇹🇭)
Ahmed Eldaw Mohamed (Visiting from University of Cape Town, South Africa🇿🇦)
Zhiyu Zhou (Visiting from Chinese University of Hong Kong, Hong Kong🇭🇰)
Hongfeng Lai (Visiting from University of Hong Kong, Hong Kong🇭🇰)
Yuxin Cheng (Visiting from University of Hong Kong, Hong Kong🇭🇰)
Zicheng Liu (Visiting from University of Hong Kong, Hong Kong🇭🇰)
Yihua Shao (Visiting from City University of Hong Kong, Hong Kong🇭🇰)
Yuxuan Fan (Visiting from HKUST (Guangzhou), China🇨🇳 -> now Ph.D. at NTU, Singapore🇸🇬)
Ruoxiang Huang (Intern from Peking University, China🇨🇳)
Nonghai Zhang (Intern from Peking University, China🇨🇳)
Dongjian Li (Intern from Peking University, China🇨🇳)
Rui Yang (Intern from Peking University, China🇨🇳)
Mingyu Li (Intern from Peking University, China🇨🇳)
Zhaohui Wang (Intern from Peking University, China🇨🇳)
Ziyan Mao (Intern from Peking University, China🇨🇳)
Xinran Kuang (Intern from Peking University, China🇨🇳)
Keyu Chen (Intern from Peking University, China🇨🇳)
Di Yu (Visiting from Tsinghua University, China🇨🇳)
Zhengxing Lei (Visiting from Zhejiang University, China🇨🇳)
Yuxuan Zhang (Visiting from Shanghai Jiao Tong University, China🇨🇳 -> now Ph.D. at CUHK, Hong Kong🇭🇰)
Renkai Wu (Visiting from Shanghai Jiao Tong University -> now Ph.D. at Tsinghua University, China🇨🇳)
Junxian Li (Visiting from Shanghai Jiao Tong University, China🇨🇳)
Jiaxing Zhang (Visiting from Sichuan University -> now Ph.D. at Shanghai Jiao Tong University, China🇨🇳)
Hui Wei (Visiting from Wuhan University, China🇨🇳 -> now Postdoc at University of Oulu, Finland🇫🇮)
Xiaofeng Zhang (Visiting from Shanghai Jiao Tong University, China🇨🇳)
Ting Huang (Visiting from Shanghai University of Engineering Science, China🇨🇳)
I-Tak Ieong (Visiting from Tongji University, China🇨🇳)
Kunze Jiang (Visiting from USTC, China🇨🇳)
Lei Xin (Visiting from Wuhan University, China🇨🇳)
Fanhu Zeng (Visiting from CASIA, China🇨🇳)
Songtao Li (Visiting from Northeastern University, China🇨🇳 -> now Ph.D. at PolyU, Hong Kong🇭🇰)
Jiawei Mao (Visiting from Hangzhou Dianzi University, China🇨🇳 -> now Ph.D. at UCSC, USA🇺🇸)
Qinhua Xie (Visiting from East China Normal University, China🇨🇳)
Zihang Liu (Visiting from Beijing Institute of Technology, China🇨🇳)
Aoming Liang (Visiting from Westlake University, China🇨🇳)
Sifan Li (Visiting from Liaoning University, China🇨🇳)

Former members and visitors:

Bowei Zhang (RA from Peking University, China🇨🇳 -> now Researcher at DeepSeek, China🇨🇳)
Youran Qu (RA from Peking University, China🇨🇳 -> now Master at Dartmouth College, USA🇺🇸)
Kaiwen Shi (RA from Peking University, China🇨🇳 -> now PhD at University of Notre Dame, USA🇺🇸)
Lujing Xie (RA from Peking University, China🇨🇳 -> now PhD at University of Texas at Dallas, USA🇺🇸)
Yaowu Zhang (RA from Peking University, China🇨🇳 -> now Master at Institute of Computing Technology, Chinese Academy of Sciences, China🇨🇳)
Haoran Li (RA from Peking University, China🇨🇳 -> now Master at Peking University, China🇨🇳)
Yaoxiang Xiong (RA from Peking University, China🇨🇳 -> now Master at Peking University, China🇨🇳)
Jinxian Ren (RA from Peking University, China🇨🇳)
Guillaume Thiry (RA from ETH Zürich, Switzerland🇨🇭 -> now Software Engineer at Google, Switzerland🇨🇭)
Sherwin Bahmani (RA from ETH Zürich, Switzerland🇨🇭 -> now Ph.D. at University of Toronto, Canada🇨🇦)
Sanghwan Kim (RA from ETH Zürich, Switzerland🇨🇭 -> now Ph.D. at TUM, Germany🇩🇪)
Alexandros Delitzas (RA from ETH Zürich, Switzerland🇨🇭 -> now Ph.D. at ETH Zürich and Max Planck Institute for Informatics, Switzerland🇨🇭 and Germany🇩🇪)
Jingfeng Rong (RA from ETH Zürich, Switzerland🇨🇭 -> now Ph.D. at Swiss Finance Institute, Switzerland🇨🇭)
Yitong Xia (RA from ETH Zürich, Switzerland🇨🇭 -> now Ph.D. at NTU, Singapore🇸🇬)
Boyan Duan (RA, now Master at ETH Zürich, Switzerland🇨🇭)
Baptiste Chopin (RA, now Postdoc at INRIA, France🇫🇷)
Chenyang Gu (RA from Peking University, China🇨🇳 -> now Ph.D. at Peking University, China🇨🇳)
Kosta Gjorgjievski (RA from UCLA, USA🇺🇸 -> now Master at Tsinghua University, China🇨🇳)
Xiaoyu Yi (RA from Peking University, China🇨🇳)

Teaching

2026 Spring, PKU: Introduction to Generative Artificial Intelligence
2026 Spring, PKU: Video Encoding and Understanding
2025 Fall, PKU: Deep Learning and Large Models
2025 Spring, PKU: Deep Generative Models
2025 Spring, PKU: Video Encoding and Understanding

International Collaborations

Our lab maintains strong collaborative relationships with several leading international research institutions, including

USA🇺🇸: MIT, Harvard, Stanford University, CMU, Princeton University, UIUC, UMich, Northeastern University, University of Maryland, University of Texas at Austin, UC Irvine, University of Illinois at Chicago, Illinois Institute of Technology, University of Connecticut, Texas State University, University of Georgia, Clemson University, University of Oregon, College of William & Mary
Canada🇨🇦: University of Toronto, Simon Fraser University
Switzerland🇨🇭: ETH Zürich, EPFL
UK🇬🇧: University of Oxford, University of Cambridge, University of Leicester, University of Warwick
Italy🇮🇹: University of Trento, FBK, Politecnico di Milano, University of Modena and Reggio Emilia
Germany🇩🇪: TUM, University of Würzburg
France🇫🇷: INRIA, University of Lille
Finland🇫🇮: University of Oulu
Netherlands🇳🇱: TU Delft
Belgium🇧🇪: KU Leuven
Bulgaria🇧🇬: INSAIT
Singapore🇸🇬: NUS, NTU
Japan🇯🇵: University of Tokyo, National Institute of Informatics
South Korea🇰🇷: Sungkyunkwan University
Australia🇦🇺: University of Adelaide, ANU, Monash University, University of Technology Sydney
UAE🇦🇪: IIAI, MBZUAI
Hong Kong🇭🇰: University of Hong Kong, Hong Kong University of Science and Technology

I am deeply grateful for the opportunities to collaborate with such esteemed institutions and for the valuable contributions they have made to our joint research efforts. Additionally, we maintain long-term collaborations with industry, including Google, Meta, Amazon, Cisco, Western Digital, Mercedes-Benz, Xiaohongshu, Alibaba, Tencent, etc., aiming to translate cutting-edge research into practical applications and drive technological advancement.

Featured Publications

(Including CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, AAAI, IJCAI, ACM MM, ICRA, IROS, CoRL, NAACL, 3DV, AAMAS, TPAMI, IJCV, TVCG)

*Corresponding Author(s)

Arxiv

Fidelity-Aware Data Composition for Robust Robot Generalization

Zizhao Tong, Di Chen, Sicheng Hu, Hongwei Fan, Liliang Chen, Guanghui Ren, Hao Tang, Hao Dong, Ling Shao

In Arxiv, 2025

PDF Code
Arxiv

UniVid: The Open-Source Unified Video Model

Jiabin Luo, Junhui Lin, Zeyu Zhang, Biao Wu, Meng Fang, Ling Chen, Hao Tang*

In Arxiv, 2025

PDF Code
Arxiv

Nav-R1: Reasoning and Navigation in Embodied Scenes

Qingxiang Liu, Ting Huang, Zeyu Zhang, Hao Tang*

In Arxiv, 2025

PDF Code
Arxiv

3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding

Ting Huang, Zeyu Zhang, Hao Tang*

In Arxiv, 2025

PDF Code
Arxiv

FE-UNet: Frequency Domain Enhanced U-Net with Segment Anything Capability for Versatile Image Segmentation

Guohao Huo, Ruiting Dai, Ling Shao, Hao Tang*

In Arxiv, 2025

PDF Code
Arxiv

RFMedSAM 2: Automatic Prompt Refinement for Enhanced Volumetric Medical Image Segmentation with SAM 2

Bin Xie, Hao Tang, Yan Yan, Gady Agam

In Arxiv, 2025

PDF Code
Arxiv

Self-Prompt SAM: Medical Image Segmentation via Automatic Prompt SAM Adaptation

Bin Xie, Hao Tang, Dawen Cai, Yan Yan, Gady Agam

In Arxiv, 2025

PDF Code
Arxiv

UDiTQC: U-Net-Style Diffusion Transformer for Quantum Circuit Synthesis

Zhiwei Chen, Hao Tang*

In Arxiv, 2025

PDF Code
Arxiv

Artificial Intelligence for Quantum Error Correction: A Comprehensive Review

Zihao Wang, Hao Tang*

In Arxiv, 2024

PDF Code Media
Arxiv

PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model

Yuqing Wang, Zhongling Huang, Shuxin Yang, Hao Tang, Xiaolan Qiu, Junwei Han, Dingwen Zhang

In Arxiv, 2024

PDF Code
Arxiv

Artificial Intelligence for Central Dogma-Centric Multi-Omics: Challenges and Breakthroughs

Lei Xin, Caiyun Huang, Hao Li, Shihong Huang, Yuling Feng, Zhenglun Kong, Zicheng Liu, Siyuan Li, Chang Yu, Fei Shen, Hao Tang*

In Arxiv, 2024

PDF Code
Arxiv

Text-to-Image Synthesis: A Decade Survey

Nonghai Zhang, Hao Tang*

In Arxiv, 2024

PDF Code
Arxiv

KMM: Key Frame Mask Mamba for Extended Motion Generation

Zeyu Zhang, Hang Gao, Akide Liu, Qi Chen, Feng Chen, Yiran Wang, Danning Li, Hao Tang*

In Arxiv, 2024

PDF Code
Arxiv

GWQ: Gradient-Aware Weight Quantization for Large Language Models

Yihua Shao, Siyu Liang, Xiaolin Lin, Zijian Ling, Zixian Zhu, Minxi Yan, Haiyang Liu, Siyu Chen, Ziyang Yan, Yilan Meng, Chenyu Zhang, Haotong Qin*, Michele Magno, Yang Yang, Zhen Lei, Yan Wang, Jingcai Guo, Ling Shao, Hao Tang*

In Arxiv, 2024

PDF Code
Arxiv

M²M: Learning Controllable Multi of Experts and Multi-Scale Operators Are the Partial Differential Equations Need

Aoming Liang, Zhaoyang Mu, Pengxiao Lin, Cong Wang, Mingming Ge, Ling Shao, Dixia Fan*, Hao Tang*

In Arxiv, 2024

PDF Code
Arxiv

InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation

Zeyu Zhang, Akide Liu, Qi Chen, Feng Chen, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang*

In Arxiv, 2024

PDF Code
Arxiv

A Survey on Multimodal Wearable Sensor-based Human Action Recognition

Jianyuan Ni, Hao Tang, Syed Tousiful Haque, Yan Yan, Anne HH Ngu

In Arxiv, 2024

PDF Code
Arxiv

StableGarment: Garment-Centric Generation via Stable Diffusion

Rui Wang, Hailong Guo, Jiaming Liu, Huaxia Li, Haibo Zhao, Xu Tang, Yao Hu, Hao Tang, Peipei Li

In Arxiv, 2024

PDF Code
ECCV

EvoVLA: Self-Evolving Vision-Language-Action Model

Zeting Liu, Zida Yang, Zeyu Zhang, Hao Tang*

In ECCV, 2026

PDF Code
ECCV

MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots

Ting Huang, Dongjian Li, Rui Yang, Zeyu Zhang, Zida Yang, Hao Tang*

In ECCV, 2026

PDF Code
ECCV

ReMoMask: Retrieval-Augmented Masked Motion Generation

Zhengdao Li, Siheng Wang, Zeyu Zhang, Hao Tang*

In ECCV, 2026

PDF Code
CVPR
Highlight

OralGPT-Plus: Learning to Use Visual Tools via Reinforcement Learning for Panoramic X-ray Analysis

Yuxuan Fan, Jing Hao, Hong Chen, Jiahao Bao, Yihua Shao, Yuci Liang, Kuo Feng Hung, Hao Tang*

In CVPR 2026, Denver, USA

PDF Code
CVPR

OralGPT-Omni: A Versatile Dental Multimodal Large Language Model

Jing Hao, Yuci Liang, Lizhuo Lin, Yuxuan Fan, Wenkai Zhou, Kaixin Guo, Zanting Ye, Yanpeng Sun, Xinyu Zhang, Yanqi Yang, Qiankun Li, Hao Tang, James Kit-Hon Tsoi, Linlin Shen, Kuo Feng Hung

In CVPR 2026, Denver, USA

PDF Code
CVPR

PAM: A Pose-Appearance-Motion Engine for Sim-to-Real HOI Video Generation

Mingju Gao, Kaisen Yang, Huan-ang Gao, Bohan Li, Ao Ding, Wenyi Li, Yangcheng Yu, Jinkun Liu, Shaocong Xu, Yike Niu, Haohan Chi, Hao Chen, Hao Tang, Yu Zhang, Li Yi, Hao Zhao

In CVPR 2026, Denver, USA

PDF Code
CVPR

MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing

Xiaokun Sun, Zeyu Cai, Hao Tang, Ying Tai, Jian Yang, Zhenyu Zhang

In CVPR 2026, Denver, USA

PDF Code
AAMAS
Oral

Structured Agent Distillation for Large Language Model Agents

Jun Liu, Zhenglun Kong, Peiyan Dong, Changdi Yang, Tianqi Li, Hao Tang*, Geng Yuan, Wei Niu, Wenbin Zhang, Pu Zhao*, Xue Lin, Dong Huang, Yanzhi Wang

In AAMAS 2026, Paphos, Cyprus

PDF Code
TVCG

DreamBarbie: Text to Barbie-Style 3D Avatars

Xiaokun Sun, Zhenyu Zhang, Ying Tai, Qian Wang, Hao Tang, Zili Yi, Jian Yang

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2026

PDF Code
ICRA

StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes

Zhengri Wu, Yiran Wang, Yu Wen, Zeyu Zhang, Biao Wu, Hao Tang*

In ICRA, 2026, Vienna, Austria

PDF Code
ICLR

VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery

Nonghai Zhang, Zeyu Zhang, Jiazi Wang, Yang Zhao, Hao Tang*

In ICLR, 2026, Rio de Janeiro, Brazil

PDF Code
ICLR
Oral

Hallucination Begins Where Saliency Drops

Xiaofeng Zhang, Yuanchao Zhu, Chaochen Gu, Xiaosong Yuan, Qiyan Zhao, Jiawei Cao, Feilong Tang, Sinan Fan, Yaomin Shen, Chen Shen, Hao Tang

In ICLR, 2026, Rio de Janeiro, Brazil

PDF Code
ICLR

MoSA: Motion-Coherent Human Video Generation via Structure-Appearance Decoupling

Haoyu Wang, Hao Tang, Donglin Di, Zhilu Zhang, Wangmeng Zuo, Feng Gao, Siwei Ma, Shiliang Zhang

In ICLR, 2026, Rio de Janeiro, Brazil

PDF Code
ICLR

SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams

Zhuoheng Gao, Yihao Li, Jiyao Zhang, Rui Zhao, Tong Wu, Hao Tang, Zhaofei Yu, Hao Dong, Guozhang Chen, Tiejun Huang

In ICLR, 2026, Rio de Janeiro, Brazil

PDF Code
TPAMI

AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations

Jiawei Mao, Yu Yang, Xuesong Yin, Ling Shao, Hao Tang*

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2026

PDF Code
AAMAS

Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts

Jiaxing Zhang, Hao Tang*

In AAMAS 2026, Paphos, Cyprus

PDF Code
AAAI

ICM-Fusion: In-Context Meta-Optimized LoRA Fusion for Multi-Task Adaptation

Yihua Shao, Xiaofeng Lin, Xinwei Long, Siyu Chen, Minxi Yan, Yang Liu, Ziyang Yan, Ao Ma, Hao Tang, Jingcai Guo

In AAAI 2026, Singapore City, Singapore

PDF Code
AAAI

TR-DQ: Time-Rotation Diffusion Quantization

Yihua Shao, Deyang Lin, Minxi Yan, Siyu Chen, Fanhu Zeng, Minwen Liao, Ao Ma, Ziyang Yan, Haozhe Wang, Yan Wang, Zhi Chen, Xiaofeng Cao, Haotong Qin*, Hao Tang*, Jingcai Guo*

In AAAI 2026, Singapore City, Singapore

PDF Code
3DV

3D CoCa: Contrastive Learners Are 3D Captioners

Ting Huang, Zeyu Zhang, Yemin Wang, Hao Tang*

In 3DV 2026, Vancouver, Canada

PDF Code
3DV

GRADRobot: Geometry-Aware Rendering with Articulation and Diffusion for Robot Modeling

Yunlong Li, Boyuan Chen, Chongjie Ye, Bohan Li, Zhaoxi Chen, Shaocong Xu, Hao Tang, Hao Zhao

In 3DV 2026, Vancouver, Canada

PDF Code
IJCV

Multimodal Alignment and Fusion: A Survey

Songtao Li, Hao Tang*

Springer International Journal of Computer Vision (IJCV), 2025

PDF Code
NeurIPS
Spotlight

RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness

Fanhu Zeng, Haiyang Guo, Fei Zhu*, Li Shen, Hao Tang*

In NeurIPS 2025, San Diego, USA

PDF Code
NeurIPS

Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis

Jing Hao, Yuxuan Fan, Yanpeng Sun, Kaixin Guo, Lizhuo Lin, Jinrong Yang, Qi Yong H. Ai, Lun M. Wong, Hao Tang*, Kuo Feng Hung*

In NeurIPS 2025, San Diego, USA

PDF Code
NeurIPS

Enhancing Diffusion-based Unrestricted Adversarial Attacks via Adversary Preferences Alignment

Kaixun Jiang, Zhaoyu Chen, Haijing Guo, Jinglun Li, Jiyuan Fu, Pinxue Guo, Hao Tang, Bo Li, Wenqiang Zhang

In NeurIPS 2025, San Diego, USA

PDF Code
NeurIPS

Boosting Adversarial Transferability with Spatial Adversarial Alignment

Zhaoyu Chen, Haijing Guo, Kaixun Jiang, Jiyuan Fu, Xinyu Zhou, Dingkang Yang, Hao Tang, Bo Li, Wenqiang Zhang

In NeurIPS 2025, San Diego, USA

PDF Code
CoRL

3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation

Xiaoqi Li, Liang Heng, Jiaming Liu, Yan Shen, Chenyang Gu, Zhuoyang Liu, Hao Chen, Nuowei Han, Renrui Zhang, Hao Tang, Shanghang Zhang, Hao Dong

In CoRL 2025, Seoul, Korea

PDF Code
CoRL
Workshop

GRADRobot: Geometry-Aware Rendering with Articulation and Diffusion for Robot Modeling

Yunlong Li, Boyuan Chen, Chongjie Ye, Bohan Li, Zhaoxi Chen, Shaocong Xu, Hao Tang, Hao Zhao

In CoRL 2025, Seoul, Korea

PDF Code
ACM MM

EventVAD: Training-free Event-aware Video Anomaly Detection

Yihua Shao, Haojin He, Sijie Li, Siyu Chen, Xinwei Long, Fanhu Zeng, Yuxuan Fan, Muyang Zhang, Ziyang Yan, Ao Ma, Xiaochen Wang, Hao Tang, Yan Wang, Shuyan Li

In ACM MM 2025, Dublin, Ireland

PDF Code
ICCV

MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation

Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan, Gady Agam

In ICCV 2025, Honolulu, USA

PDF Code
IROS
Oral

TTTFusion: A Test-Time Training-Based Strategy for Multimodal Medical Image Fusion in Surgical Robots

Qinhua Xie, Hao Tang*

In IROS 2025, Hangzhou, China

PDF Code
IROS
Oral

CRUISE: Cooperative Reconstruction and Editing in V2X Scenarios using Gaussian Splatting

Haoran Xu, Saining Zhang, Peishuo Li, Baijun Ye, Xiaoxue Chen, Huan-ang Gao, Jv Zheng, Xiaowei Song, Ziqiao Peng, Run Miao, Jinrang Jia, Yifeng Shi, Guangqi Yi, Hang Zhao, Hao Tang, Hongyang Li, Kaicheng Yu, Hao Zhao

In IROS 2025, Hangzhou, China

PDF Code
IJCV

AutoViT: Achieving Real-Time Vision Transformers on Mobile via Latency-aware Coarse-to-Fine Search

Zhenglun Kong, Dongkuan Xu, Zhengang Li, Peiyan Dong, Hao Tang, Yanzhi Wang, Subhabrata Mukherjee

Springer International Journal of Computer Vision (IJCV), 2025

PDF Code
IJCAI

Semantic-Guided Diffusion Model for Single-Step Image Super-Resolution

Zihang Liu, Zhenyu Zhang, Hao Tang*

In IJCAI 2025, Montreal, Canada

PDF Code
IJCAI

In-Context Meta LoRA Generation

Yihua Shao, Minxi Yan, Yang Liu, Siyu Chen, Wenjie Chen, Xinwei Long, Ziyang Yan, Lei Li, Chenyu Zhang, Nicu Sebe, Hao Tang*, Yan Wang, Hao Zhao, Mengzhu Wang, Jingcai Guo*

In IJCAI 2025, Montreal, Canada

PDF Code
IJCAI

FairSMOE: Mitigating Multi-Attribute Fairness Problem with Sparse Mixture-of-Experts

Changdi Yang, Zheng Zhan, Ci Zhang, Yifan Gong, Yize Li, Zichong Meng, Jun Liu, Xuan Shen, Hao Tang, Geng Yuan, Pu Zhao, Xue Lin, Yanzhi Wang

In IJCAI 2025, Montreal, Canada

PDF Code
Advanced Science

Smart Organic–Inorganic Copolymer Nanoparticles Distinguish Between Microglia and Cancer Cells for Synergistic Immunotherapy in Glioma

Shiming Zhang, Kun Shang, Lidong Gong, Qian Xie, Jianfei Sun, Meng Xu, Xunbin Wei, Zhaoheng Xie, Xinyu Liu, Hao Tang, Zhengren Xu, Wei Wang, Haihua Xiao, Zhiqiang Lin, Hongbin Han

Advanced Science, 2025

PDF Code
CVPR
Oral

DiffFNO: Diffusion Fourier Neural Operator

Xiaoyi Liu, Hao Tang*

In CVPR 2025, Nashville, USA

PDF Code
CVPR

MambaIC: State Space Models for High-Performance Learned Image Compression

Fanhu Zeng, Hao Tang, Yihua Shao, Siyu Chen, Ling Shao, Yan Wang

In CVPR 2025, Nashville, USA

PDF Code
CVPR

PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model

Mingju Gao, Yike Pan, Huan-ang Gao, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, Hao Zhao

In CVPR 2025, Nashville, USA

PDF Code
ICRA

Toward Zero-Shot Learning for Visual Dehazing of Urological Surgical Robots

Renkai Wu, Xianjin Wang, Pengchen Liang, Zhenyu Zhang, Qing Chang*, Hao Tang*

In ICRA 2025, Atlanta, USA

PDF Code
NAACL

From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks

Xiaofeng Zhang, Yihao Quan, Chen Shen, Xiaosong Yuan, Shaotian Yan, Liang Xie, Wenxiao Wang, Chaochen Gu, Hao Tang, Jieping Ye

In NAACL 2025, Albuquerque, USA

PDF Code
AAAI

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment

Jun Liu, Zhenglun Kong, Pu Zhao, Changdi Yang, Hao Tang*, Xuan Shen, Geng Yuan, Wei Niu, Wenbin Zhang, Xue Lin, Dong Huang*, Yanzhi Wang*

In AAAI 2025, Philadelphia, USA

PDF Code
AAAI

Stable-Hair: Real-World Hair Transfer via Diffusion Model

Yuxuan Zhang, Qing Zhang, Yiren Song, Jichao Zhang, Hao Tang, Jiaming Liu

In AAAI 2025, Philadelphia, USA

PDF Code
AAAI

Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling

Jianan Jiang, Hao Tang*, Zhilin Jiang, Weiren Yu, Di Wu*

In AAAI 2025, Philadelphia, USA

PDF Code
NeurIPS

Revisiting Adversarial Patches for Designing Camera-Agnostic Attacks against Person Detection

Hui Wei, Zhixiang Wang, Kewei Zhang, Jiaqi Hou, Yuanwei Liu, Hao Tang, Zheng Wang

In NeurIPS 2024, Vancouver, Canada

PDF Code
ACM MM
Oral

ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance

Haijie Yang, Zhenyu Zhang, Hao Tang, Jianjun Qian, Jian Yang

In ACM MM 2024, Melbourne, Australia

PDF Code
ACM MM

CoIn: A Lightweight and Effective Framework for Story Visualization and Continuation

Ming Tao, Bingkun Bao, Hao Tang, Yaowei Wang, Changsheng Xu

In ACM MM 2024, Melbourne, Australia

PDF Code
TPAMI

Physical Adversarial Attack Meets Computer Vision: A Decade Survey

Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin'ichi Satoh, Luc Van Gool, Zheng Wang

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

PDF Code
ECCV

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang*

In ECCV 2024, Milan, Italy

PDF Code
ECCV

3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, Xu Wang

In ECCV 2024, Milan, Italy

PDF Code
ECCV

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

Ming Tao, Bingkun Bao, Hao Tang, Yaowei Wang, Changsheng Xu

In ECCV 2024, Milan, Italy

PDF Code
ECCV

InstructGIE: Towards Generalizable Image Editing

Zichong Meng, Changdi Yang, Jun Liu, Hao Tang*, Pu Zhao*, Yanzhi Wang*

In ECCV 2024, Milan, Italy

PDF Code
ECCV

SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior

Huan-ang Gao, Mingju Gao, Jiaju Li, Wenyi Li, Rong Zhi, Hao Tang, Hao Zhao

In ECCV 2024, Milan, Italy

PDF Code
ECCV

Dataset Growth

Ziheng Qin, Zhaopan Xu, Yukun Zhou, Zangwei Zheng, Zebang Cheng, Hao Tang, Lei Shang, Baigui Sun, Xiaojiang Peng, Radu Timofte, Hongxun Yao, Kai Wang, Yang You

In ECCV 2024, Milan, Italy

PDF Code
CVPR
Highlight

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

Wencan Cheng, Hao Tang, Luc Van Gool, Jong Hwan Ko

In CVPR 2024, Seattle, USA

PDF Code
CVPR

Versatile Navigation under Partial Observability via Value-guided Diffusion Policy

Gengyu Zhang, Hao Tang, Yan Yan

In CVPR 2024, Seattle, USA

PDF Code
CVPR

Towards Robust 3D Pose Transfer with Adversarial Learning

Haoyu Chen, Hao Tang, Ehsan Adeli, Guoying Zhao

In CVPR 2024, Seattle, USA

PDF Code
CVPR

On the Faithfulness of Vision Transformer Explanations

Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan

In CVPR 2024, Seattle, USA

PDF Code
CVPR

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan

In CVPR 2024, Seattle, USA

PDF Code
CVPR

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

Yuxuan Zhang, Jiaming Liu, Yiren Song, Rui Wang, Hao Tang, Jinpeng Yu, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing

In CVPR 2024, Seattle, USA

PDF Code
CVPR
Workshop

Towards Online Real-Time Memory-based Video Inpainting Transformers

Guillaume Thiry, Hao Tang*, Radu Timofte, Luc Van Gool

In CVPR 2024, Seattle, USA

PDF Code
AAAI

G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model

Pan Xie, Qipeng Zhang, Taiying Peng, Hao Tang*, Yao Du, Zexian Li

In AAAI 2024, Vancouver, Canada

PDF Code
NeurIPS

HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception

Peiyan Dong, Zhenglun Kong, Xin Meng, Pinrui Yu, Yifan Gong, Geng Yuan, Hao Tang*, Yanzhi Wang

In NeurIPS 2023, New Orleans, USA

PDF Code
NeurIPS

PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile

Peiyan Dong, Lei Lu, Chao Wu, Cheng Lyu, Geng Yuan, Hao Tang*, Yanzhi Wang

In NeurIPS 2023, New Orleans, USA

PDF Code
NeurIPS

LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer

Haoyu Chen, Hao Tang, Radu Timofte, Luc Van Gool, Guoying Zhao

In NeurIPS 2023, New Orleans, USA

PDF Code
NeurIPS

Does Graph Distillation See Like Vision Dataset Counterpart?

Beining Yang, Kai Wang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Hao Tang, Yang You, Jianxin Li

In NeurIPS 2023, New Orleans, USA

PDF Code
MIR

Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis

Kai Zhang, Yawei Li, Jingyun Liang, Jiezhang Cao, Yulun Zhang, Hao Tang, Dengping Fan, Radu Timofte, Luc Van Gool

Springer Machine Intelligence Research (MIR), 2023

PDF Code
ICCV

Learning Concordant Attention via Target-aware Alignment for Visible-Infrared Person Re-identification

Jianbing Wu, Hong Liu, Yuxin Su, Wei Shi, Hao Tang

In ICCV 2023, Paris, France

PDF Code
ICML

SpeedDETR: Speed-aware Transformers for End-to-end Object Detection

Peiyan Dong, Zhenglun Kong, Xin Meng, Peng Zhang, Hao Tang*, Yanzhi Wang, Chih-Hsien Chou

In ICML 2023, Honolulu, USA

PDF Code
IJCAI

Data Level Lottery Ticket Hypothesis for Vision Transformers

Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang

In IJCAI 2023, Macao, China

PDF Code
CVPR

Graph Transformer GANs for Graph-Constrained House Generation

Hao Tang, Zhenyu Zhang, Humphrey Shi, Bo Li, Ling Shao, Nicu Sebe, Radu Timofte, Luc Van Gool

In CVPR 2023, Vancouver, Canada

PDF Code
CVPR

Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration

Guofeng Mei, Hao Tang, Xiaoshui Huang, Weijie Wang, Juan Liu, Jian Zhang, Luc Van Gool, Qiang Wu

In CVPR 2023, Vancouver, Canada

PDF Code
CVPR

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Xuan Shen, Yaohua Wang, Ming Lin, Yilun Huang, Hao Tang, Xiuyu Sun, Yanzhi Wang

In CVPR 2023, Vancouver, Canada

PDF Code
CVPR

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

Ming Tao, Bingkun Bao, Hao Tang, Changsheng Xu

In CVPR 2023, Vancouver, Canada

PDF Code
ICLR

Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis

Hao Tang, Xiaojuan Qi, Guolei Sun, Dan Xu, Nicu Sebe, Radu Timofte, Luc Van Gool

In ICLR 2023, Kigali, Rwanda

PDF Code
ECCV

3D-Aware Semantic-Guided Generative Model for Human Synthesis

Jichao Zhang, Enver Sangineto, Hao Tang, Aliaksandr Siarohin, Zhun Zhong, Nicu Sebe, Wei Wang

In ECCV 2022, Tel Aviv, Israel

PDF Code
ECCV

Towards Interpretable Video Super-Resolution via Alternative Optimization

Jiezhang Cao, Jingyun Liang, Kai Zhang, Wenguan Wang, Qin Wang, Yulun Zhang, Hao Tang, Luc Van Gool

In ECCV 2022, Tel Aviv, Israel

PDF Code
CVPR

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc Van Gool

In CVPR 2022, New Orleans, USA

PDF Code
CVPR
Oral

DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

Ming Tao, Hao Tang, Fei Wu, Xiaoyuan Jing, Bingkun Bao, Changsheng Xu

In CVPR 2022, New Orleans, USA

PDF Code